reference distribution
- North America > United States (0.14)
- Asia > China > Guangdong Province > Shenzhen (0.04)
- Asia > China > Hong Kong (0.04)
- North America > United States > Virginia (0.04)
- North America > United States > California > Los Angeles County > Santa Monica (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.95)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.74)
- North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.93)
- Europe > Austria > Vienna (0.14)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- North America > Canada > Ontario > Toronto (0.14)
- Oceania > Australia > Victoria > Melbourne (0.04)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- Consumer Products & Services > Restaurants (1.00)
- Energy (0.68)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.05)
- Europe > Netherlands > South Holland > Dordrecht (0.04)
- North America > United States > New Jersey > Hudson County > Hoboken (0.04)
- North America > United States (0.14)
- Asia > Macao (0.04)
- Asia > China > Hubei Province > Wuhan (0.04)
- Asia > China > Hong Kong (0.04)
RealStats: A Rigorous Real-Only Statistical Framework for Fake Image Detection
As generative models continue to evolve, detecting AI-generated images remains a critical challenge. While effective detection methods exist, they often lack formal interpretability and may rely on implicit assumptions about fake content, potentially limiting robustness to distributional shifts. In this work, we introduce a rigorous, statistically grounded framework for fake image detection that focuses on producing a probability score interpretable with respect to the real-image population. Our method leverages the strengths of multiple existing detectors by combining training-free statistics. We compute p-values over a range of test statistics and aggregate them using classical statistical ensembling to assess alignment with the unified real-image distribution. This framework is generic, flexible, and training-free, making it well-suited for robust fake image detection across diverse and evolving settings.
- Information Technology > Security & Privacy (1.00)
- Law > Criminal Law (0.83)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
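The RealStats abstract above describes computing p-values for several test statistics against the real-image population and then aggregating them with classical statistical ensembling. The following is a minimal sketch of that general pattern, not the authors' implementation. It assumes that each detector yields a scalar statistic where larger values look more fake-like, that p-values are estimated empirically from a calibration set of real images, and that Fisher's method is the aggregation rule (the abstract does not name one; Fisher's method also assumes the p-values are independent). The function names `empirical_p_value` and `fisher_combine` are hypothetical.

```python
import math

def empirical_p_value(stat, real_stats):
    """Share of real-image calibration statistics at least as large as `stat`.

    The +1 correction keeps the p-value strictly positive.
    Assumption: larger statistics are more extreme (more fake-like).
    """
    exceed = sum(1 for s in real_stats if s >= stat)
    return (exceed + 1) / (len(real_stats) + 1)

def fisher_combine(p_values):
    """Combine p-values with Fisher's method (assumes independent p-values).

    The statistic x = -2 * sum(ln p) is chi-squared with 2k degrees of
    freedom under the null; for even degrees of freedom the survival
    function has the closed form exp(-x/2) * sum_{i<k} (x/2)^i / i!.
    """
    k = len(p_values)
    half = -sum(math.log(p) for p in p_values)  # this is x / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total

# Hypothetical calibration statistics from two detectors on real images.
real_a = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
real_b = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]

# Statistics of one query image under the same two detectors.
p_a = empirical_p_value(0.95, real_a)
p_b = empirical_p_value(4.8, real_b)
combined = fisher_combine([p_a, p_b])
print(p_a, p_b, combined)
```

A small combined p-value indicates that the query image is unlikely under the real-image calibration population. Because only real images are used for calibration, nothing in this sketch depends on a particular generator.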
Universal Sample Coding
In this work, we study the problem of communicating multiple samples from an unknown probability distribution using as few bits as possible. This is a generalization of the channel simulation problem, which has recently found applications and achieved state-of-the-art results in realistic image compression, neural network compression, and communication-efficient federated learning. In this problem, the transmitter wants the receiver to generate multiple independent and identically distributed (i.i.d.) samples from a target distribution $P$, while the transmitter and the receiver have access to independent samples from a reference distribution $Q$. The core idea is to employ channel simulation in multiple rounds while updating the reference distribution $Q$ after each round in order to reduce the KL-divergence between $P$ and $Q$, thereby reducing the communication cost in subsequent rounds. We derive a lower bound on the expected communication cost and construct a practical algorithm that achieves the lower bound up to a multiplicative constant. We then employ this algorithm in communication-efficient federated learning, in which model updates correspond to samples from a distribution, and achieve a 37% reduction in the communication load. To further highlight the potential of sample communication for generative models, we show that the number of bits needed to communicate samples from a large language model can be reduced by up to 16 times compared to entropy-based data compression.
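The core idea in the abstract above, refining the reference distribution $Q$ after each round so that the KL-divergence from $P$ shrinks, can be illustrated with a small simulation. This is a sketch under stated assumptions and not the paper's algorithm: it does not implement channel simulation itself, it uses a discrete $P$ over a few symbols, it starts from a uniform $Q$, and it updates $Q$ with an add-one (Laplace) smoothed estimate from the samples delivered so far. The names `kl_bits` and `simulate_rounds`, the estimator, and the round sizes are all hypothetical choices made for illustration.

```python
import math
import random

def kl_bits(p, q):
    """KL divergence D(P || Q) in bits for discrete distributions."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def simulate_rounds(p, round_sizes, seed=0):
    """Return D(P || Q) at the start of each round.

    Q is re-estimated before every round from all samples delivered in
    earlier rounds. Assumption: add-one smoothing, which gives a uniform
    Q before any samples have been seen.
    """
    rng = random.Random(seed)
    k = len(p)
    counts = [0] * k
    kls = []
    for n in round_sizes:
        total = sum(counts)
        q = [(c + 1) / (total + k) for c in counts]
        kls.append(kl_bits(p, q))
        # Stand-in for one round of channel simulation: n samples from P
        # arrive at the receiver and become available for updating Q.
        for x in rng.choices(range(k), weights=p, k=n):
            counts[x] += 1
    return kls

p = [0.7, 0.2, 0.1]
kls = simulate_rounds(p, [1000, 1000, 1000])
for round_index, value in enumerate(kls):
    print(f"round {round_index}: D(P||Q) = {value:.4f} bits")
```

In this toy setting the divergence at the start of the first round is that between $P$ and the uniform distribution, and it drops sharply once $Q$ has been re-estimated from earlier samples. This mirrors the abstract's argument that later rounds become cheaper to communicate.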